Computational Biology and Chemistry

Elsevier BV

Preprints posted in the last 90 days, ranked by how well they match Computational Biology and Chemistry's content profile, based on 23 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.

1
Minimal Amino Acid Alphabet for Protein Design

Pubal, K.; Kushnir, K.; Spiwok, V.; Louzecka, K.; Setnicka, V.; Lipovova, P.

2026-03-06 bioinformatics 10.64898/2026.03.06.710107 medRxiv
Top 0.1%
4.1%

Proteins are built from 20 canonical amino acids. It is interesting to explore whether proteins can be formed from significantly reduced amino acid alphabets. Our bioinformatics survey of UniProt (more than 250 M sequences) revealed that proteins composed of reduced amino acid alphabets (< 10) are extremely rare among existing proteins. Next, we used computational protein design to design proteins composed of all 1,013 possible alphabets of 2-10 early amino acids (Ala, Asp, Glu, Gly, Ile, Leu, Pro, Ser, Thr, and Val). The length of all proteins was 100 amino acid residues. Small amino acid alphabets preferred simple helices or helix bundles. Larger amino acid alphabets allowed for the design of more complex structures. A protein composed of 8 amino acids (Ala, Asp, Gly, Leu, Val, Ser, Thr, and Pro) was successfully experimentally verified. It belongs to fibronectin type III domain β-sheet-rich architecture. Attempts to experimentally verify designs composed of 6 and 4 amino acids were unsuccessful. We show by a computational experiment without an experimental validation that inverse folding programs, namely ProteinMPNN, can stabilize designed proteins within the same amino acid alphabet. Our results show that globular proteins may have formed early in evolution. Furthermore, we show that it is possible to design proteins with interesting properties for biotechnology and synthetic biology.
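
The figure of 1,013 alphabets quoted in this abstract can be checked with a short calculation: it is the number of subsets of size 2 to 10 drawn from the 10 listed amino acids. The sketch below is only an illustration of that arithmetic; the function names are assumptions introduced here and are not taken from the preprint.

```python
from itertools import combinations
from math import comb

# The ten "early" amino acids listed in the abstract.
EARLY = ["Ala", "Asp", "Glu", "Gly", "Ile", "Leu", "Pro", "Ser", "Thr", "Val"]


def count_alphabets(n_letters: int, k_min: int, k_max: int) -> int:
    """Count subsets of size k_min..k_max drawn from n_letters items."""
    return sum(comb(n_letters, k) for k in range(k_min, k_max + 1))


def enumerate_alphabets(letters, k_min, k_max):
    """Yield every reduced alphabet as a tuple of residue names."""
    for k in range(k_min, k_max + 1):
        yield from combinations(letters, k)


total = count_alphabets(len(EARLY), 2, 10)
print(total)  # 2**10 - 1 (empty set) - 10 (single-letter sets) = 1013
```

Counting with `comb` and enumerating with `combinations` give the same total, so the enumeration can feed a design loop directly when the actual subsets are needed.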

2
Influence of molecular representation and charge on protein-ligand structural predictions by popular co-folding methods

Bugrova, A.; Orekhov, P.; Gushchin, I.

2026-02-18 bioinformatics 10.64898/2026.02.18.706547 medRxiv
Top 0.1%
2.4%

Recently developed deep learning-based tools can effectively generate structural models of complexes of proteins and non-proteinaceous compounds. While some of their predictive capabilities are truly exciting, others remain to be thoroughly tested. Here, we probe whether the ligand input format (Chemical Component Dictionary, CCD, or Simplified Molecular Input Line Entry System, SMILES) and charge (which depends on protonation) affect the predictions of four popular algorithms: AlphaFold 3, Boltz-2, Chai-1, and Protenix-v1. We chose methylamine and acetic acid as two of the simplest titratable chemicals that are omnipresent in proteins as amino and carboxy moieties, and are consequently ubiquitous in the Protein Data Bank models that are most commonly used for training. Unexpectedly, we found that for both molecules, in many cases the input format affected the prediction results, and did so much more strongly than protonation, whereas changes in the formally specified charge of the molecules did not lead to the changes in binding expected from experiments. We conclude that (i) ensuring identical results irrespective of input formats and (ii) inclusion of protonation-related steps into training and prediction pipelines are the two available paths for improvement of protein-ligand structure prediction algorithms.

3
Structure-Based TCR-pMHC Binding Prediction and Generalization to Unseen Peptides

Abeer, A. N. M. N.; Roy, R. S.; Qian, X.; Yoon, B.-J.

2026-02-23 bioinformatics 10.64898/2026.02.21.707231 medRxiv
Top 0.1%
2.0%

The interaction between T-cell receptors (TCRs) and the peptide-bound major histocompatibility complex (MHC) intricately impacts the functional specificity of the T-cell-mediated adaptive immune response. Consequently, its implications for immunotherapy have contributed to the ever-growing body of computational methods for TCR recognition, which has recently attracted structure-based approaches due to advancements in protein structure modeling. Despite access to structural information of the predicted binding interface, graph neural network (GNN)-based TCR-pMHC binding specificity classifiers tend to show poor accuracy for samples with unseen peptides. In this work, we comprehensively assess the potential factors that critically impact the generalization performance of classifiers trained with computationally predicted structures. Specifically, our experiments focus on analyzing the sensitivity of such predictors to the interaction features in the TCR-pMHC interface and the structural uncertainty. Building on the analysis, we demonstrate how the design of classifier architecture with auxiliary training objectives can improve the generalization performance to novel peptides not yet seen during model training. Overall, our work highlights the challenges of unseen peptide generalization from different perspectives of the GNN-based classifier paradigm, showcasing the strengths and weaknesses of the current state-of-the-art approaches in the generalization landscape.

4
Characterizing Highly Conserved Fragments in 3'UTRs via Computational and Transfer Learning Approaches

Ho, E. S.; Baeck-Hubloux, A.; Dinh, N.; Severino, A.; Troy, C.

2026-01-20 genomics 10.64898/2026.01.19.700376 medRxiv
Top 0.1%
1.7%

3' untranslated regions (3'UTRs) serve as regulatory platforms that modulate translation, mRNA localization, and stability through the binding of regulators, such as RNA-binding proteins (RBPs) and miRNAs, in a sequence-specific manner. These vital binding sites are often identified through orthologous regions among species. A separate but related discovery is the ultraconserved elements (UCEs) detected in human, rat, and mouse genomes two decades ago. However, our knowledge about their functions is limited. Perplexingly, alterations in UCEs in mouse embryos can still produce viable progeny with no observable phenotypic differences. The majority of UCEs are non-coding, though ~8% are located in the 3'UTRs. Given the importance of 3'UTRs in gene regulation, we use a computational approach to identify highly conserved fragments (CFs) in 3'UTRs across diverse mammals, applying criteria appropriate for 3'UTRs (≥50 bp and ≥90% identity). Results show that they are not composed of simple repeats or low-complexity regions common to mammalian genomes. Using a transformer-based foundational genomic model, CFs are characterized as A and T-rich and distinguishable from the 3'UTR background. 36 human CFs from 25 genes are significantly depleted in variations in humans. They are enriched in neuronal tissues and play roles in neurodevelopment and RNA processing, mediated by RBPs and miRNAs. Our findings expand on existing studies that attribute UCEs primarily to enhancer function, suggesting a new path to explore the biological roles of UCEs in 3'UTRs. [Graphical abstract: figure omitted. Created in BioRender. Ho, E. (2026) https://BioRender.com/dcyrx5f]

5
Protein Language Modeling and Evolutionary Analysis Reveal an N-terminal Determinant of Functional Divergence in Cytochrome P450s from Sophora tonkinensis

Qiao, Z.; Wang, J.; Qin, B.; Wei, F.; Liang, Y.

2026-03-07 plant biology 10.64898/2026.03.06.710024 medRxiv
Top 0.2%
1.4%

The N-terminal signal sequences of plant cytochrome P450 enzymes are recognized as critical determinants for subcellular localization and functional diversification, yet their evolutionary drivers and mechanisms remain largely unresolved. In this study, the evolutionary trajectories of these signals were systematically decoded through the integration of the protein language model ESM-2 with phylogenetic and selection analyses. A conserved functional fingerprint was identified. This region may serve as the essential endoplasmic reticulum targeting signal and be evolutionarily decoupled from adjacent surfaces under positive selection during lineage-specific expansions. A functional-adaptive decoupling model is proposed to explain this pattern, wherein a conserved functional core is maintained while surrounding interfaces diversify. This evolutionary architecture is interpreted as the outcome of a two-step cycle: an initial phase of positive selection driving functional innovation, followed by pervasive neutral evolution that facilitates structural exploration and potentiates future adaptations. This work demonstrates how interpretable machine learning can be integrated with evolutionary theory to reconcile neutralist and selectionist perspectives on protein evolution. A novel framework is thus provided for understanding the layered evolution of protein modules, where structural constraint, adaptive innovation, and neutral drift operate on distinct tiers to generate functional diversity.

6
Predicting unknown binding sites for transition metal based compounds in proteins

Levy, A.; Rothlisberger, U.

2026-02-03 bioinformatics 10.64898/2026.01.29.702545 medRxiv
Top 0.2%
1.3%

Transition metal based compounds are promising therapeutic agents, particularly in cancer treatment. However, predicting their binding sites remains a major challenge. In this work, we investigate the applicability of two tools, Metal3D and Metal1D, for this purpose. Although originally trained to predict zinc ion binding sites only, both predictors successfully identify several experimentally observed binding sites for transition metal complexes directly from apo protein structures. At the same time, we highlight current limitations, such as the sensitivity to side-chain conformations, and discuss possible strategies for improvement. This work provides a first step toward establishing a robust computational pipeline in which rapid and low-cost predictors are able to identify putative hotspots for transition metal binding, which can then be refined using more accurate but computationally demanding methods. Author summary: Transition metals play a crucial role as therapeutic agents, especially in cancer therapy. However, the prediction of their binding site locations is challenging, as accurate computational methods often require time-consuming simulations, making them impractical when many possible binding sites must be explored. In this work, we explored the capability of two binding site predictors, originally developed to locate metal ions in proteins, to identify binding sites for more complex covalently-bound transition metal based agents. We found that these tools can often identify the experimentally-known binding regions, even when starting from the apo structure, in which the protein does not already contain the metal compound. At the same time, our results show clear limitations in more challenging cases, particularly when the binding involves only a single amino acid or when the binding site undergoes major structural rearrangements upon binding. 
Overall, our study shows that fast predictors can provide valuable early insights in the investigation of the binding sites of covalently-bound transition metal based compounds. When combined with more accurate simulation techniques, they can help focus computational efforts and ultimately support the rational design of transition metal based drugs.

7
Information Leakage in Enzyme Substrate Prediction

Atabaigi Elmi, V.; Joeres, R.; Kalinina, O. V.

2026-03-01 bioinformatics 10.64898/2026.02.26.708291 medRxiv
Top 0.2%
1.3%

Enzymes are essential catalysts in many cellular processes. Understanding their interactions with small molecules, such as regulators, cofactors, and most importantly, substrates, is crucial for understanding the biochemical processes that occur in cells. Correctly interpreting the roles of small molecules that interact with enzymes is key to elucidating enzyme function. Recently, the field of enzyme-small molecule interaction prediction has gained more interest from computational and, especially, deep-learning methods, and numerous datasets and models with remarkable performances have been published. In this work, we critically examine one of the most popular datasets and three models trained on it, identifying leaked information that may overinflate reported model performance. We show that the inspected models are susceptible to information leakage, and their performance drops to near-random when the leakage is removed.
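
The abstract does not say which kind of leakage was found. As a generic illustration only, one common check for enzyme-small molecule datasets is to measure how many test pairs share an exact pair, an enzyme, or a molecule with the training split. The sketch below assumes each example is a hypothetical `(enzyme_id, molecule_id)` tuple; the function name and the toy identifiers are invented for this example and do not come from the preprint.

```python
def leakage_report(train_pairs, test_pairs):
    """Fraction of test pairs that overlap with the training split.

    Each pair is a hypothetical (enzyme_id, molecule_id) tuple.
    """
    if not test_pairs:
        raise ValueError("test_pairs must not be empty")

    train_set = set(train_pairs)
    train_enzymes = {enzyme for enzyme, _ in train_pairs}
    train_molecules = {molecule for _, molecule in train_pairs}
    n = len(test_pairs)

    return {
        # identical (enzyme, molecule) pair present in both splits
        "exact_pair": sum(pair in train_set for pair in test_pairs) / n,
        # enzyme already seen during training
        "enzyme_seen": sum(e in train_enzymes for e, _ in test_pairs) / n,
        # molecule already seen during training
        "molecule_seen": sum(m in train_molecules for _, m in test_pairs) / n,
    }


# Toy identifiers, made up for illustration.
train = [("E1", "M1"), ("E2", "M2")]
test = [("E1", "M1"), ("E1", "M3"), ("E3", "M2"), ("E4", "M4")]
print(leakage_report(train, test))
```

Exact-identifier overlap is the weakest form of this check; a stricter audit would also group near-identical sequences or molecules before splitting, which this sketch does not attempt.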

8
CrossAffinity: A Sequence-Based Protein-Protein Binding Affinity Prediction Tool Using Cross-Attention Mechanism

Guan, J. S.; Wang, Z.; Mu, Y.

2026-02-23 bioinformatics 10.64898/2026.02.22.707318 medRxiv
Top 0.2%
1.3%

Protein-protein binding affinity is important for understanding protein interactions within a protein complex and for identifying strong drug-peptide binders to a target protein. Many structure-based models were built previously with reasonable performance. However, such models require protein complex structure as input, which is usually unavailable due to high cost and experimental constraints. To tackle such an issue, the sequence-based CrossAffinity model was constructed in this study, using the cross-attention module to extract contextual information of interacting protein components while separating the protein complex into two distinct parts to predict the protein-protein binding affinity. CrossAffinity managed to outperform all structure-based models and sequence-based models in an S34 test set containing newer protein complex structures and binding affinity values in a timeline while being trained on an older dataset, showing generalisability to new data points. In other test sets, namely S90, S90 subset and S79*, CrossAffinity also managed to outperform all other sequence-based models while maintaining comparable performance to many recently published structure-based models. The acceptable performance and quick inference of CrossAffinity enable it to be deployed in situations requiring the prediction of the binding affinity of many protein complexes that lack structural information.

9
Comparative Transcriptomic Analysis of ATRA-Resistant and ATRA-Sensitive APL Cell Lines Identifies LncRNA Biomarkers Associated with Drug Resistance

Marimuthu, O.; Shinde, N.; Sella, R. N.

2026-01-30 cancer biology 10.64898/2026.01.27.702191 medRxiv
Top 0.2%
1.3%

Acute promyelocytic leukemia (APL) is a distinct subtype of acute myeloid leukemia characterized by the t(15;17) translocation, leading to the PML (Promyelocytic leukemia protein)-RARA (Retinoic Acid Receptor Alpha) fusion protein. Although the PML-RARA fusion is common, 20 more fusion events have also been reported in APL. All-trans retinoic acid (ATRA) is a standard drug for APL, leading to significant improvement in patient outcomes; nevertheless, a small fraction of patients still experience relapse, and some patients exhibit resistance to the drug. Long non-coding RNAs (lncRNAs) are recognized as promising biomarkers for cancer diagnosis, prognosis, and treatment response. In this study, we used transcriptomic data from ATRA-resistant (AP1060) and ATRA-sensitive (NB4) cell lines, both treated and untreated, retrieved from the NCBI Gene Expression Omnibus (GEO) database to perform transcriptomic analysis with bioinformatic tools. We utilized the LncRAnalyzer pipeline to predict the lncRNAs, followed by differential expression analysis using DESeq2. Weighted Gene Co-expression Network Analysis (WGCNA) was employed to construct lncRNA co-expression modules associated with ATRA resistance. BEDTools was used to identify cis-acting target genes of lncRNAs. LncRNA-miRNA sponging was identified with the miRanda algorithm. The identified miRNAs reveal their significant role in APL and other leukemia subtypes. The results of the study show that the identified lncRNAs from the miRNA-lncRNA network are promising biomarkers for ATRA resistance.

10
Computational insights into the interaction between Topoisomerase I and Rpc82 subunit of RNA Polymerase III in Saccharomyces cerevisiae

Nandi, P.; Kamal, I. M.; Chakrabarti, S.; Sengupta, S.

2026-02-03 bioinformatics 10.64898/2026.01.31.703072 medRxiv
Top 0.3%
1.2%

The process of DNA transcription leads to the generation of torsional stress, which must be resolved for smooth progression of the transcription machinery. In Saccharomyces cerevisiae, DNA topoisomerase I (Top1), a type IB topoisomerase, plays a critical role in relaxing supercoils and mitigating the topological strain associated with transcription. While several proteins from the transcription machinery have been reported to interact with yeast Top1, detailed characterization and functional relevance of these interactions have remained underexplored. This gap is partly due to the absence of a complete three-dimensional structure of the full-length enzyme, which hinders structure-based computational analyses of its interactome. In this study, we present a template-based model of full-length yeast Top1. Leveraging this model, we investigated its molecular interaction with Rpc82, a key subunit of RNA polymerase III enzyme, responsible for transcribing small non-coding RNAs such as tRNAs and 5S rRNA. Through molecular docking and molecular dynamics simulations, critical residues at the Top1-Rpc82 interface were identified that likely mediate their interaction. Our findings provide new insights into the structural basis of Top1's association with RNA polymerase III and its potential role in regulating Pol III-mediated transcription. The Top1 model developed here offers a valuable framework for future in silico studies aimed at elucidating the broader interactome and regulatory mechanisms of this essential enzyme.

11
The Role of Human-Specific lncRNA in Hyaline Cartilage Development

Osone, T.; Takao, T.; Takarada, T.

2026-02-18 bioinformatics 10.64898/2026.02.17.706478 medRxiv
Top 0.3%
1.0%

One of the distinctive characteristics of humans is their bipedalism. To achieve upright bipedal walking, the angles of the pelvis and femur have been altered. Although evolutionary hypotheses on the transition to bipedalism exist, the molecular mechanisms remain unclear. This study attempts to elucidate these mechanisms using a system for inducing hyaline cartilage-like tissue from human iPS cells via limb bud like mesenchymal cells. Focus was placed on non-coding RNAs, known for their potential in generating biological diversity. Bulk RNA sequencing was conducted to compare the expression and functions of human-specific long non-coding RNAs between limb bud like mesenchymal cells and induced hyaline cartilage-like tissue. The results indicated that human-specific lncRNAs, significantly upregulated in hyaline cartilage-like tissue, may regulate genes related to the extracellular matrix. These findings suggest the potential to develop regenerative cartilage tissue with enhanced ECM quality through controlling human-specific lncRNAs. Additionally, studying human-specific lncRNAs could elucidate mechanisms of diseases that are less common in other species but more prevalent in humans.

12
Analysis of Age-Specific Dysregulation of miRNAs in Lung Cancer Via Machine learning: Biomarker Identification and Therapeutic Implications in Patients Aged 60 and Above.

Hasan, A.; Muzaffar, A.

2026-02-14 bioinformatics 10.64898/2026.02.12.705605 medRxiv
Top 0.3%
0.9%

Lung cancer, the leading cause of cancer-related mortality worldwide, predominantly affects older individuals, with non-small cell lung cancer (NSCLC) comprising 85% of cases. Despite advancements in diagnosis and treatment, prognosis for elderly patients remains poor. This study investigates the role of microRNAs (miRNAs) involved in lung cancer, focusing on individuals aged 60 and above. RNA sequencing data from The Cancer Genome Atlas (TCGA) were used to conduct differential expression analysis of miRNA profiles from elderly and senile patient groups. Results showed that out of 1,881 miRNA profiles, 801 were differentially expressed. Filtering for significance identified 25 miRNAs: hsa-mir-1911 was upregulated and 24, including hsa-mir-196a and hsa-mir-323b, were downregulated. Studies showed that these miRNAs play roles in apoptosis, senescence, and inflammation. A second approach in this study used machine learning analysis, which highlighted key miRNAs, including hsa-mir-181b, hsa-mir-542, hsa-mir-450b, hsa-mir-584, and hsa-mir-21, as crucial in lung cancer biology. Moreover, functional enrichment analysis revealed their involvement in gene silencing, translational repression, and RNA-induced silencing complex (RISC) regulation. This research identifies the association of miRNAs and aging in lung cancer and finds potential biomarkers that can be helpful in early diagnosis and as targets for personalized therapies.

13
Identifying Convergent Therapeutic Targets and Pathways for Post-Traumatic Stress Disorder, Schizophrenia And Bipolar Disorder via In Silico Approaches

Khan, M.; Rahman, F.; Nishu, N. A.; Hossain, M. A.

2026-02-28 bioinformatics 10.64898/2026.02.26.708243 medRxiv
Top 0.3%
0.9%

Objective: The objective of this study is to provide a concise overview of the various molecular problems and possible treatment targets that have been linked to the onset of certain psychiatric diseases. Methods: We obtained the data from NCBI and applied GREIN to analyze our datasets. The protein-protein interaction, gene regulatory network, protein-drug-chemical, gene ontology, and pathway networks were constructed using the STRING, FunRich and DAVID libraries. To display our suggested network, we used Cytoscape and RStudio, and verified our hub genes using ROC analysis. Results: We discovered a number of strong candidate hub proteins in significant pathways, namely HLA-DRA, HLA-A, HLA-B, HLA-DOB and BRD2, out of 32 common genes. We also identified a number of TFs (FOXC1, NFYA, RELA, GATA2, FOXL1, SRF and NFIC), miRNAs (hsa-mir-129-2-3p, hsa-mir-148b-3p, hsa-mir-196a-5p, hsa-mir-26a-5p, hsa-mir-27a-3p, hsa-mir-23b-3p, hsa-mir-500a-3p, hsa-mir-423-5p, hsa-miR-142-5p, and hsa-miR-671-5p) and chemicals (Estradiol, Antirheumatic Agents, Valproic Acid, Selenium, Vitamin E, ICG 001, Ifosfamide, Tetrachlorodibenzodioxin, arsenic trioxide, entinostat, sodium arsenite and Hydralazine) that may control DEGs at the transcriptional as well as post-transcriptional expression levels. Conclusion: In summary, our computational methods have identified distinct potential biomarkers that demonstrate the impact of PTSD, Schizophrenia, and BD on autoimmune inflammation and infectious diseases. Additionally, we have identified pathways and gene regulators through which these psychiatric disorders may affect biological processes. Graphical Abstract: The graphical abstract demonstrates the thorough strategy of combining systems biology and computational technologies to identify significant markers and pathways in blood tissues impacted by post-traumatic stress disorder, Schizophrenia, and Bipolar disorder. [Figure omitted.]

14
SERPINA3 and NDRG1 are critical diagnostic immune genes associated with macrophages in preeclampsia

Wu, Z.; Chen, S.; Chen, W.; Xie, Y.; Zhou, Z.; Huang, L.; Sheng, L.; Wang, Y.; Chen, B.; Yang, C.; Ke, Y.

2026-02-10 immunology 10.64898/2026.02.09.704892 medRxiv
Top 0.4%
0.8%

Objective: The immune system plays a role in the occurrence and progression of numerous pregnancy complications, particularly preeclampsia (PE). This study aims to identify critical immune biomarkers via machine learning and assess their predictive ability. Methods: Gene expression data were retrieved from the GEO database, while immune-related genes were obtained from the ImmPort repository. Differential expression analysis was then conducted to identify immune genes associated with PE. Differentially expressed immune-related genes (DIRGs) were subjected to functional and pathway enrichment analysis. We used protein-protein interaction (PPI) networks to explore the connections among the DIRGs and integrated two machine-learning methods to pinpoint candidate biomarkers in PE. Diagnostic performance was assessed via ROC curve analysis, with predictive accuracy further quantified using nomogram calibration. Findings were validated through integrated computational and experimental analyses. In silico validation utilized additional GEO datasets, while experimental confirmation involved qRT-PCR and IHC assessment of placental tissues. We developed a nomogram to predict PE utilizing two immune-related genes. Cellular composition was inferred from transcriptomic data using CIBERSORT deconvolution. Results: We identified 66 differentially expressed genes (DEGs) and 10 DIRGs between PE pregnancies and normotensive pregnancies. The GO analyses revealed that the DIRGs were enriched in gonadotropin secretion, the regulation of gonadotropin secretion, and the regulation of endocrine processes. Functional annotation revealed enrichment in cytokine and neuroactive ligand-receptor pathways. SERPINA3 and NDRG1 emerged as top-performing biomarkers (training AUCs: 0.812 and 0.866; external validation: 0.795 and 0.781), with overexpression validated in clinical specimens. Both genes inversely regulated M2 macrophage abundance (P < 0.05). Conclusion: PE is fundamentally an immune-mediated disorder. 
SERPINA3 and NDRG1 can be identified as key immune genes associated with M2 macrophages, and these findings provide novel perspectives for the diagnosis and pathogenesis of PE.

15
Transcriptomic Analysis Reveals Inflammatory and Metabolic Dysregulation in Unexplained Female Infertility

Patial, R.; Ray, S.; Singh, K.; Sobti, R. C.

2026-01-26 bioinformatics 10.64898/2026.01.24.701467 medRxiv
Top 0.4%
0.8%

Infertility is a complex condition affecting both the male and female population. Influenced by multiple factors, it remains a constant challenge due to limited understanding of endometrial abnormalities. With this study we aim to investigate the molecular basis of infertility using transcriptomic analysis of endometrial tissue from the NCBI GEO dataset GSE92324. We performed exploratory data analysis using Principal Component Analysis (PCA) to assess sample variance, followed by differential gene expression (DGE) analysis using the DESeq2 package, where we identified 168 significant genes with adjusted p-value < 0.05 and |log2FC| > 2. Upregulated genes included GPX3, CXCL14, and PPARGC1A, and downregulated genes included WNK4, GJB2, and TRPM6. Functional enrichment using KEGG and GO showed that differentially expressed genes (DEGs) are involved in immune-inflammatory pathways, lipid metabolism and steroid biosynthesis pathways. Through Ingenuity Pathway Analysis (IPA) we identified affected canonical pathways such as increased innate immune responses, altered lipid metabolism and inhibition of mitochondrial dysfunction. Upstream regulator analysis highlighted PTEN, PRKAA1, HDAC4, IL10RA, and RAD51, which were impacting metabolic pathways and anti-inflammatory signalling. Further, through Weighted Gene Co-expression Network Analysis (WGCNA) we found a turquoise module that had a very strong and highly significant negative correlation (cor = -0.84, P < 0.0001) with traits of interest. This led to the discovery of C7orf50 as a novel candidate gene involved in cholesterol metabolism and linked to infertility. This integrative approach reveals crucial genes, co-expression modules, and underlying pathways involved in female infertility. 
[Graphical abstract: figure omitted.]

Highlights:
From the dataset GSE92324, a total of 168 significant DEGs associated with unexplained infertility were identified using adjusted p-value < 0.05 and |log2FC| > 2.
Comparison with the CTD list identified five genes (C1orf106, C15orf59, LINC00461, C15orf48, and C10orf99) with no previous direct evidence of involvement in infertility.
WGCNA analysis highlighted the turquoise module as highly associated and gave the novel gene C7orf50, associated with cholesterol metabolism.
IPA revealed PTEN, PRKAA1, IL10RA, and RAD51 as potential upstream regulators, and inflammatory pathways and mitochondrial dysfunction as canonical pathways.
The study highlights a novel link between GI inflammation and endometrial receptivity.
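
The significance filter described in this abstract (adjusted p-value < 0.05 and |log2FC| > 2) can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes the DESeq2 results have been exported as rows carrying the standard `padj` and `log2FoldChange` columns, and the function name, gene names, and numbers below are made up for the example.

```python
def filter_degs(results, padj_cutoff=0.05, lfc_cutoff=2.0):
    """Return gene names passing both the padj and |log2FC| thresholds.

    `results` is a list of dicts with keys "gene", "padj" and
    "log2FoldChange"; a padj of None stands in for DESeq2's NA.
    """
    selected = []
    for row in results:
        padj = row["padj"]
        if padj is None:
            # genes without an adjusted p-value cannot be called significant
            continue
        if padj < padj_cutoff and abs(row["log2FoldChange"]) > lfc_cutoff:
            selected.append(row["gene"])
    return selected


# Toy rows with invented values, for illustration only.
toy_results = [
    {"gene": "GENE_A", "padj": 0.001, "log2FoldChange": 3.1},   # up, kept
    {"gene": "GENE_B", "padj": 0.2, "log2FoldChange": 4.0},     # not significant
    {"gene": "GENE_C", "padj": 0.01, "log2FoldChange": -2.5},   # down, kept
    {"gene": "GENE_D", "padj": 0.01, "log2FoldChange": 1.5},    # effect too small
    {"gene": "GENE_E", "padj": None, "log2FoldChange": 5.0},    # NA padj
]
print(filter_degs(toy_results))
```

Using the absolute value of the fold change keeps both up- and downregulated genes with a single comparison, which matches the |log2FC| notation in the abstract.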

16
Inhibition of miR-1307 Reverses Resistance to Cisplatin in Drug-Resistant Oral Squamous Cell Carcinoma

Patel, A.; Patel, V.; Lotia, S.; Patel, K.; Mandlik, D.; Tan, J.; Sampath, P.; Patel, B.; Johar, K.; Bhatia, D. D.; Tanavde, V.; Patel, S.

2026-04-09 cancer biology 10.64898/2026.04.06.709730 medRxiv
Top 0.5%
0.8%

Background: Chemo-resistance remains a major clinical challenge in Oral Squamous Cell Carcinoma (OSCC), attributed to the intrinsically resistant cells. Although tumour-derived extracellular vesicles (EVs) have been implicated in cell-cell communication, their role in propagating chemo-resistance remains poorly defined. This study aims to identify salivary EV-associated miRNAs capable of predicting chemoresistance and to delineate the role of miR-1307-5p in modulating CSC-driven therapeutic refractoriness. Methods: Salivary EV-derived expression profile of miR-1307-5p was assessed by qPCR in chemo resistant OSCC patients and further validated in TCGA small RNA sequencing datasets. Expression was validated by qPCR and correlated with clinicopathological outcomes. Functional assays including cell-cycle analysis, apoptosis, migration/invasion, 3D spheroids, angiogenesis, and CAM assays were performed in miR-1307-5p inhibited CD44 CSC subpopulation compared to its vehicular control. Transcriptomic profiling cross-referencing with TCGA was conducted to identify potential novel targets of miR-1307-5p. Chemo-sensitisation was assessed by treating the knockdown chemo resistant cells with low dose cisplatin and validating it using in-vitro functional assays and orthotopic xenograft model. Results: miR-1307-5p was significantly elevated in salivary EVs of chemo resistant OSCC patients and correlated with poor overall survival (p = 0.03). The miRNA was markedly enriched in endogenously resistant CD44 CSCs. Silencing of miR-1307-5p induced G2/M arrest, triggered apoptosis, impaired invasion, and reduced angiogenesis both in-vitro and in ex-vivo assays. Transcriptomic profiling, TCGA validation, and integrative pathway analysis identified key oncogenic hubs which converge on PI3K-AKT, MAPK/ERK, and YAP signalling pathways governing EMT. 
Inhibition of miR-1307-5p restored cisplatin sensitivity in resistant CSCs, with low-dose cisplatin producing substantial tumour suppression in-vitro and in-vivo. Reduced CD44 expression in xenograft models confirmed CSC reprogramming. EVs from anti-miR-treated cells confer chemo sensitisation upon uptake by resistant CSCs. Xenograft models substantiated that EVs can initiate tumour formation and that EV-mediated delivery of anti-miR-1307-5p drives significant tumour regression. Conclusion: This study identifies salivary EV-derived miR-1307-5p as a clinically relevant biomarker of chemoresistance in OSCC and reveals its mechanistic role in sustaining CSC-driven therapeutic failure. Targeting miR-1307-5p offers a promising avenue for restoring cisplatin sensitivity and developing exosome-based therapeutic strategies. [Graphical abstract: figure omitted.]

17
AI-Driven Reconstruction of the Research Paradigm for Phase Separation in Membraneless Organelle

Ding, Y.; Lu, T.; Li, Y.

2026-04-02 cell biology 10.64898/2026.03.31.715491 medRxiv
Top 0.5%
0.8%

Liquid-liquid phase separation (LLPS) of biomacromolecules is a key mechanism driving the formation of membraneless organelles (MLOs) within cells, playing a crucial role in fundamental biological processes such as cell proliferation and stress response. Accurately understanding and predicting the phase separation propensity of proteins is essential for unraveling the assembly mechanisms of MLOs and their functions under both physiological and pathological conditions. Traditional research methods primarily rely on biochemical experiments, which are limited by low throughput, high cost, and difficulty in systematically exploring sequence-phase transition relationships. This study proposes and implements a novel three-stage, iterative paradigm based on artificial intelligence (AI) to propel phase separation research towards systematization, predictability, and mechanistic understanding. (1) Benchmark Model Construction: A preliminary predictive model was established based on a Multilayer Perceptron (MLP) neural network, and the driving effect of phenylalanine/tyrosine (F/Y) residue-mediated π-π interactions on LLPS was validated. (2) Model Robustness Enhancement: The model was optimized through adversarial training strategies, which effectively identified and eliminated misclassifications of "highly disordered non-phase-separating" trap sequences. This significantly improved the model's generalization capability and reliability when handling complex, real-world sequences. (3) Physical Mechanism Integration and Functional Expansion: Incorporating the Uniform Manifold Approximation and Projection (UMAP) manifold learning method and constraints from non-equilibrium thermodynamics, a "fingerprint space" capable of characterizing the thermodynamic behavior of phase separation was constructed. This space enables cluster analysis of different MLO types, and the model can output a thermodynamic stability score for protein phase separation. Based on this score, we identified 10 high-confidence candidate proteins with the potential to form novel MLOs. The paradigm established in this study upgrades phase separation prediction from the traditional "binary classification" approach to a novel research framework characterized by "physical mechanism analysis + novel MLO discovery." It provides the phase separation field with a computational tool that combines high accuracy, strong robustness, and good physical interpretability.

18
Exploration of the screening and regulatory mechanisms of biomarkers related to ac4C modification in laryngeal squamous cell carcinoma patients based on single-cell analysis and machine learning

Wang, L.; Gong, X.; Chen, D.; Chen, X.; Zhou, H.; Lan, J.; Ye, R.; Luo, Z.; Shi, Y.

2026-03-03 developmental biology 10.64898/2026.02.28.708684 medRxiv
Top 0.5%
0.8%

Background: N4-acetylcytidine (ac4C) modification plays a critical role in cancer development. Exploring ac4C modification in laryngeal squamous cell carcinoma (LSCC) may help elucidate its pathogenesis. Methods: LSCC-related datasets were obtained from GEO. After preprocessing and annotating single-cell data, malignant cells were identified by CNV scoring and further divided into subpopulations. Malignant epithelial cells (MECs) were identified and subclustered based on ac4C-related gene activity. Prognostic genes were screened using Cox regression and machine-learning approaches, followed by validation in clinical samples using qPCR. The biological and immunological relevance of these genes was further explored through immune infiltration, immunotherapy response, and mutation analyses. Results: The 14,465 identified MECs were classified into five subgroups (MEC1-5), among which MEC3 showed the strongest association with the ac4C gene set. Machine-learning analysis of MEC3-derived genes yielded seven prognostic markers: BARX1, FHL2, NXPH4, PKMYT1, TNFAIP8L1, CRLF1, and CENPP. qPCR confirmed their differential expression between tumor and adjacent normal tissues. These genes were significantly associated with alterations in the tumor immune microenvironment, with high-risk patients showing increased immune infiltration and immune activity. Conclusion: Seven ac4C-related prognostic genes were identified that may contribute to LSCC progression by modulating the tumor immune microenvironment, providing potential therapeutic insights.

19
A functional annotation based integration of different similarity measures for gene expressions

Misra, S.; Roy, S.; Ray, S. S.

2026-02-24 bioinformatics 10.64898/2026.02.23.707392 medRxiv
Top 0.5%
0.7%

Genes with similar expression profiles often exhibit similar functional properties. To improve gene similarity estimates, an "integrated similarity score" (ISS) is developed by combining different expression similarity measures through weights obtained using biological information. The expression similarity measures are converted to the common framework of positive predictive value using functional annotation. A fitness function, called "fitness function using functional annotation of genes" (FFFAG), is also developed by minimizing the difference between the functional similarity value and the ISS. The FFFAG is used to determine the weight combination of the different similarity measures in the ISS. In addition, an existing similarity measure, called TMJ (integrated similarity measure by multiplying Triangle and Jaccard similarity), is modified to incorporate biological knowledge involving functional annotation. The results demonstrate that the ISS is superior to the individual similarity measures for finding similar gene pairs. Further, the ISS predicts the functional categories of 40 unclassified yeast genes from 12 clusters at a p-value cutoff of 10^-10. The associated code is accessible at http://www.isical.ac.in/~shubhra/ISS.html.
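The weighting idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two similarity measures, the grid search, and every function name below are assumptions chosen for illustration, and the conversion to positive predictive value described in the abstract is omitted. The sketch only shows how weights for a combined score might be chosen by minimizing the squared difference between the combined score and a functional similarity value.

```python
from math import sqrt


def pearson_similarity(x, y):
    """Pearson correlation rescaled from [-1, 1] to [0, 1]."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.5  # assumption: treat a flat profile as uncorrelated
    return (cov / (norm_x * norm_y) + 1) / 2


def euclidean_similarity(x, y):
    """Map Euclidean distance into (0, 1]; identical profiles score 1."""
    return 1 / (1 + sqrt(sum((a - b) ** 2 for a, b in zip(x, y))))


def integrated_score(measure_scores, weights):
    """Weighted sum of the per-measure similarity scores for one gene pair."""
    return sum(w * s for w, s in zip(weights, measure_scores))


def fit_weights(pair_scores, functional_similarity, steps=20):
    """Grid-search weights (w, 1 - w) for two measures.

    Hypothetical stand-in for a fitness function: picks the weights that
    minimize the summed squared difference between the integrated score
    and the functional similarity of each gene pair.
    """
    best_error, best_weights = None, None
    for i in range(steps + 1):
        weights = (i / steps, 1 - i / steps)
        error = sum(
            (integrated_score(scores, weights) - target) ** 2
            for scores, target in zip(pair_scores, functional_similarity)
        )
        if best_error is None or error < best_error:
            best_error, best_weights = error, weights
    return best_weights
```

Here each element of `pair_scores` would hold the two similarity values for one gene pair, for example `(pearson_similarity(g1, g2), euclidean_similarity(g1, g2))`, and `functional_similarity` would hold the annotation-derived value for the same pair. A grid search is used only because it is easy to read; any optimizer over the weights would serve the same purpose.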

20
Evaluation of degron motifs in Escherichia coli using a fluorescent reporter

Izert-Nowakowska, M. A.; Szybowska, P. E.; Klimecka, M. M.; Gorna, M. W.

2026-03-07 microbiology 10.64898/2026.03.07.710301 medRxiv
Top 0.5%
0.7%

Fluorescent reporters provide a useful tool for studying degron motifs. Fusing a degron of interest to a fluorescent protein allows protein levels to be tracked accurately over time, so that the degradation kinetics of the studied degron can be characterised. Here we describe a rapid and simple method to study degron peptides in Escherichia coli using plasmid-encoded eGFP-degron fusion constructs. The described methods provide an accessible workflow to evaluate degrons. We provide protocols for the generation of pBAD plasmids encoding the studied constructs and two different methods for evaluating degrons: an end-point fluorescence measurement on agar plates and a kinetic measurement in liquid cultures in a 96-well format for high-throughput degron studies.
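As a rough illustration of how a kinetic fluorescence series might be turned into a degradation estimate, the Python sketch below fits a first-order decay to background-corrected readings. The abstract does not state which kinetic model or analysis the authors use, so the first-order assumption, the function name `fit_half_life`, and the example readings are all hypothetical.

```python
from math import log


def fit_half_life(times, fluorescence):
    """Estimate a first-order decay rate and half-life from a time series.

    Assumes F(t) = F0 * exp(-k * t), so ln F is linear in t with slope -k.
    Readings must be positive (background-corrected) and the signal must
    actually decay, otherwise the half-life is undefined.
    """
    if any(f <= 0 for f in fluorescence):
        raise ValueError("fluorescence readings must be positive")
    log_f = [log(f) for f in fluorescence]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(log_f) / n
    # Ordinary least-squares slope of ln F against t.
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, log_f)) / sum(
        (t - mean_t) ** 2 for t in times
    )
    k = -slope
    if k <= 0:
        raise ValueError("signal does not decay; half-life undefined")
    return k, log(2) / k
```

For instance, hypothetical readings of 800, 400, 200 and 100 fluorescence units at 0, 10, 20 and 30 minutes halve every 10 minutes, so `fit_half_life([0, 10, 20, 30], [800, 400, 200, 100])` returns a half-life of about 10 minutes. In a 96-well experiment the same fit could be applied to each well independently.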